We outline our research findings, achieved during the last three
years under the auspices of the above grant, on the topics of

(i) Stochastic control under partial observations;

(ii) Dynamic Allocation and Multi-armed Bandit Problems;

(iii) Deterministic and anticipative aspects of stochastic optimization;

(iv) Predictable representation properties for semimartingales;

(v) The stochastic version of Pontryagin's maximum principle;

(vi) Representation of additive functionals of Markov processes.


On the first topic, we focused on Adaptive Control Problems of the
Bayesian type, which can be formulated equivalently as stochastic
control problems with partial observations. Roughly speaking, one
tries to steer the state of a system toward a desired goal by
optimizing a certain performance index, in the presence of
unobservable parameters in the plant equations and subject to random
disturbances. These parameters are modelled by random variables with
known "prior" distributions. A natural question then arises: can one
obtain optimal control laws in this setup by simply plugging
least-squares estimates of the unobserved parameters into the formulae
for the optimal laws that are valid when all the parameters are known?
This "certainty-equivalence" (or separation) principle is well known
to hold for linear-quadratic-Gaussian systems.
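In symbols, and purely as an illustration (the observed state X, the unknown parameter θ, the feedback map φ and the LQG matrices below are our own notation, not that of the papers cited), the question and the classical LQG answer can be written as follows. The least-squares estimate of θ given the observations is its conditional mean.

```latex
% Certainty equivalence (illustrative notation): known-parameter law vs. plug-in law
u^{*}(t) = \varphi\bigl(t, X(t); \theta\bigr) \quad (\theta \text{ known}), \qquad
u^{\mathrm{ce}}(t) = \varphi\bigl(t, X(t); \hat\theta(t)\bigr), \qquad
\hat\theta(t) = \mathbb{E}\bigl[\theta \mid \mathcal{F}^{X}_{t}\bigr].

% LQG separation: dx = (Ax + Bu)\,dt + dW, \quad dy = Hx\,dt + dV,
% cost \mathbb{E}\bigl[\int_0^T (x^\top Q x + u^\top R u)\,dt + x(T)^\top F\, x(T)\bigr]
u^{*}(t) = -R^{-1} B^{\top} P(t)\, \hat{x}(t), \qquad
\hat{x}(t) = \mathbb{E}\bigl[x(t) \mid \mathcal{F}^{y}_{t}\bigr], \qquad
\dot P + A^{\top} P + P A - P B R^{-1} B^{\top} P + Q = 0, \quad P(T) = F.
```

In the LQG case the gain -R^{-1}B^T P(t) is exactly the one for the fully observed problem, and the estimate x̂ is supplied by the Kalman-Bucy filter, so the optimal law is obtained by plugging in the estimate.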

A significant thrust of our research has focused on exploring the
validity of this very appealing principle for other, more complicated
dynamics and cost structures, as well as for constrained control sets
(cf. Benes et al. (1991), Karatzas & Ocone (1992), (1993)). The
analytical problems one encounters are quite challenging, and have led
us to the resolution of very interesting questions in the theory of
random processes and in fully nonlinear partial differential equations
of the second order.


The so-called Multi-armed Bandit, or Dynamic Allocation, problem, on
the other hand, "is important as one of the simplest nontrivial
situations on which one must face a conflict between taking actions
which yield immediate reward, and taking actions (such as acquiring
information, or preparing the ground) whose benefit will come only
later" (P. Whittle (1980)). It has become a classic, not least through
the pioneering work of J. Gittins and his collaborators, who managed to
crack open the infinite-horizon, discrete-time, Markovian version of
this problem. In the papers El Karoui & Karatzas (1993a, b) we studied
the general problem of optimal stopping in sufficient detail to permit
a very simple probabilistic resolution of the general (non-Markovian)
discrete-time dynamic allocation problem, in the spirit of
Gittins-Whittle. We then extended these methodologies to the far more
difficult continuous-time version of this problem, in El Karoui &
Karatzas (1994), (1995). We obtained a very simple resolution of the
problem in the special case of "decreasing rewards", and used refined
tools from the stochastic analysis of multi-parameter processes,
stopping times and martingales to reduce the general case to this
simple one, via the lower envelopes of Gittins index processes. We also
explored the implications of this approach for the study of diffusion
processes with singular drift coefficients, such as "skewed Brownian
motions".


In the paper Davis & Karatzas (1994) we offered a deterministic
approach to the problem of optimal stopping, which reduces this
difficult problem to pathwise maximization of a modified reward
process. The modification is described in terms of a process L(.) that
plays the role of a Lagrange multiplier (enforcing the
non-anticipativity constraint inherent in the definition of a stopping
time) and which is given, at any time t, by L(t) = M(T) - M(t), where T
is the terminal time and M(.) is the martingale in the Doob-Meyer
decomposition Z = M - A of the Snell envelope Z(.) of the reward process
Y(.). This idea works in both discrete and continuous time; in the
former case, it yields a very simple proof of the famous "prophet
inequality" of Krengel & Sucheston. The doctoral student I. Pikovsky is
working on extending this idea to more general stochastic control
problems, and on its connections with anticipative stochastic analysis
and optimization (cf. Accardi & Pikovsky (1994) for a very general,
unified approach to non-anticipative stochastic integration, based on
ideas of Quantum Probability and on the Malliavin Calculus; and
Pikovsky & Karatzas (1994) for additional aspects of stochastic control
with side-information, or "anticipation").
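The identity behind this approach can be checked numerically in discrete time. Since Z ≥ Y and Z = M - A with A non-decreasing and A(0) = 0, we have Y(t) + L(t) ≤ M(T) for every t, with equality at the first time Z meets Y (before which A has not yet increased). Hence max_t [Y(t) + L(t)] = M(T) along every path, and its expectation is Z(0), the value of the optimal stopping problem. The sketch below assumes a symmetric ±1 random walk over T steps and a hypothetical discounted put-type reward, both chosen only for illustration; it computes the Snell envelope by backward induction, builds M path by path, and compares the two sides.

```python
from itertools import product


def snell_envelope(T, reward):
    """Snell envelope Z(t, s) of Y(t) = reward(t, S(t)) for a symmetric +/-1 walk S."""
    Z = {(T, s): reward(T, s) for s in range(-T, T + 1)}
    for t in range(T - 1, -1, -1):
        for s in range(-t, t + 1):
            continuation = 0.5 * (Z[(t + 1, s + 1)] + Z[(t + 1, s - 1)])
            Z[(t, s)] = max(reward(t, s), continuation)
    return Z


def check_lagrangian_identity(T, reward):
    """Return Z(0), E[max_t (Y(t) + L(t))] and the largest |max_t (Y + L) - M(T)|."""
    Z = snell_envelope(T, reward)
    expected_max, worst_gap = 0.0, 0.0
    for steps in product((-1, 1), repeat=T):  # all 2**T equally likely paths
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        # Martingale part of Z = M - A: M(0) = Z(0) and
        # M(t+1) - M(t) = Z(t+1) - E[Z(t+1) | F_t].
        M = [Z[(0, 0)]]
        for t in range(T):
            s = path[t]
            cond_exp = 0.5 * (Z[(t + 1, s + 1)] + Z[(t + 1, s - 1)])
            M.append(M[-1] + Z[(t + 1, path[t + 1])] - cond_exp)
        # Pathwise maximization of the modified reward Y(t) + L(t), L(t) = M(T) - M(t).
        best = max(reward(t, path[t]) + M[T] - M[t] for t in range(T + 1))
        expected_max += best / 2 ** T
        worst_gap = max(worst_gap, abs(best - M[T]))
    return Z[(0, 0)], expected_max, worst_gap


if __name__ == "__main__":
    # Hypothetical reward: a discounted put-type payoff on the walk.
    def put_reward(t, s):
        return 0.9 ** t * max(1 - s, 0)

    value, pathwise, gap = check_lagrangian_identity(8, put_reward)
    print(value, pathwise, gap)
```

Any adapted reward of the form reward(t, S(t)) can be substituted for put_reward; the first two returned quantities should agree, and the gap should vanish, up to rounding.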


Much of our research in optimal stopping and stochastic control has
attracted considerable interest here at Columbia on the part of our
strong graduate students. Prompted by this, we have offered advanced
doctoral courses on the Probabilistic Aspects of Optimal Stopping and
Control, and have tried to compile as complete a collection of Lecture
Notes as possible (Karatzas (1993)); our hope is that they will form
the nucleus of a monograph or textbook.


On a different tack, we studied with the former doctoral student Abel
Cadenillas the Stochastic Maximum Principle for control problems with
linear dynamics, general random (adapted) coefficients, and convex but
quite singular cost criteria; see Cadenillas (1992), Cadenillas &
Karatzas (1995). Without imposing L^p-bounds on the controls, we solved
explicitly for the associated adjoint process, and succeeded in
obtaining integral and local versions of the stochastic maximum
principle; these led, in turn, to both necessary and sufficient
conditions for optimality of a control.

The version of the stochastic maximum principle in these references
is the only one, to our knowledge, that covers the
Consumption/Investment Problem. When applied to problems like the
[...] Linear Regulator, it gives results stronger than those possible
with previous versions of the principle.


The work of the former graduate student and post-doctoral associate
X. Xue (1994) on the martingale representation property for the
filtration of two independent semimartingales is an outgrowth of his
dissertation here at Columbia. Finally, the paper Hoehnle & [...], by
the visiting research associate R. Hoehnle, extends [...] of X. Xue on
the additive [...]